pacman::p_load(tidyverse, reshape2, ggthemes)Take-home Exercise 1 Part I - Creating Data Visualisation Beyond Default

Context
There are two major residential property market in Singapore, namely public and private housing. Public housing aims to meet the basic need of the general public with monthly household income less than or equal to S$14,000. For families with monthly household income more than S$14,000, they need to turn to the private residential market.
The Task
Assuming the role of a graphical editor of a median company, a minimum two and maximum three of data visualisations are prepared to reveal the private residential market and sub-markets of Singapore for the 1st quarter of 2024.
The Data

To accomplish the task, transaction data of REALIS will be used.
Downloading the Dataset
Access Dataset via SMU e-library

After logging in with SMU credentials, navigate to “Residential” tab

Under Property Types, “Select All”
Under Sale Date, select “2024 Jan” - “2024 Mar”
Click “Search”
Click “Download”
Due to the size of the dataset, it is split into multiple segments. Download all in .csv format

The Designing Tool
The data will be processed using the appropriate tidyverse family of packages and the statistical graphics will be prepared using ggplot2 and its extensions.
Getting Started
Installing and loading the required libraries
Note: Ensure that the pacman package has already been installed.
The code chunk below loads the following packages using uses p_load() of pacman package:
tidyverse: (i.e. readr, tidyr, dplyr, ggplot2, lubridate) for performing data science tasks such as importing, tidying, and wrangling data, as well as creating graphics based on The Grammar of Graphics
reshape2 for transforming data between wide and long formats
ggthemes: provides some extra themes, geoms, and scales for ‘ggplot2’.
Importing the Data
The data has been split into multiple .csv files
list.files()list all CSV files in the specified directory.After looping through each CSV file, read it into a data frame using
read_csv(), and store it in a list.bind_rows()combines all data frames in the list into a single big data frame.
csv_directory <- "data/"
csv_files <- list.files(csv_directory, pattern = "\\.csv$", full.names = TRUE)
realis <- list()
for (file in csv_files) {
realis[[file]] <- read_csv(file)
}realis_all <- bind_rows(realis)View Data
names()function prints the names of the columns in the tibble data frame.glimpse()function gives a quick overview of the tibble data frame
col_names <- names(realis_all)
col_names [1] "Project Name" "Transacted Price ($)"
[3] "Area (SQFT)" "Unit Price ($ PSF)"
[5] "Sale Date" "Address"
[7] "Type of Sale" "Type of Area"
[9] "Area (SQM)" "Unit Price ($ PSM)"
[11] "Nett Price($)" "Property Type"
[13] "Number of Units" "Tenure"
[15] "Completion Date" "Purchaser Address Indicator"
[17] "Postal Code" "Postal District"
[19] "Postal Sector" "Planning Region"
[21] "Planning Area"
glimpse(realis_all)Rows: 26,806
Columns: 21
$ `Project Name` <chr> "THE REEF AT KING'S DOCK", "URBAN TREASU…
$ `Transacted Price ($)` <dbl> 2317000, 1823500, 1421112, 1258112, 1280…
$ `Area (SQFT)` <dbl> 882.65, 882.65, 1076.40, 1033.34, 871.88…
$ `Unit Price ($ PSF)` <dbl> 2625, 2066, 1320, 1218, 1468, 1767, 1095…
$ `Sale Date` <chr> "01 Jan 2023", "02 Jan 2023", "02 Jan 20…
$ Address <chr> "12 HARBOURFRONT AVENUE #05-32", "205 JA…
$ `Type of Sale` <chr> "New Sale", "New Sale", "New Sale", "New…
$ `Type of Area` <chr> "Strata", "Strata", "Strata", "Strata", …
$ `Area (SQM)` <dbl> 82.0, 82.0, 100.0, 96.0, 81.0, 308.7, 42…
$ `Unit Price ($ PSM)` <dbl> 28256, 22238, 14211, 13105, 15802, 19015…
$ `Nett Price($)` <chr> "-", "-", "-", "-", "-", "-", "-", "-", …
$ `Property Type` <chr> "Condominium", "Condominium", "Executive…
$ `Number of Units` <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
$ Tenure <chr> "99 yrs from 12/01/2021", "Freehold", "9…
$ `Completion Date` <chr> "Uncompleted", "Uncompleted", "Uncomplet…
$ `Purchaser Address Indicator` <chr> "HDB", "Private", "HDB", "HDB", "HDB", "…
$ `Postal Code` <chr> "097996", "419535", "269343", "269294", …
$ `Postal District` <chr> "04", "14", "27", "27", "28", "19", "10"…
$ `Postal Sector` <chr> "09", "41", "26", "26", "79", "54", "27"…
$ `Planning Region` <chr> "Central Region", "East Region", "North …
$ `Planning Area` <chr> "Bukit Merah", "Bedok", "Yishun", "Yishu…
The dataset used for analysis was prepared by Prof. Kam and contains data beyond the required first quarter (Q1) of 2024. If you downloaded the data directly from Realis and applied the correct filters according to the “Downloading the Dataset” section, your data should be limited to the period of Q1 2024.
realis_all contains:
Public and Private residential property transaction data from 1st January 2023 to 31st March 2024.
There are 26806 rows and 21 columns.
Data Preparation
The task only requires data from the private residential market and sub-markets of Singapore for the 1st quarter of 2024.
Standardise Date Format
The “Sales Date” column is currently a cha type. It needs to be converted into date format.
dmy() is a function from the lubridate package that converts character strings to date format in the day-month-year (DMY) order.
realis_all$`Sale Date` <- dmy(realis_all$`Sale Date`)View Data
head(realis_all$`Sale Date`)[1] "2023-01-01" "2023-01-02" "2023-01-02" "2023-01-02" "2023-01-03"
[6] "2023-01-03"
Keep Relevant Rows
Filter and keep only rows that:
- Sales Date occur within Q1 2024 i.e. between 01 Jan 2024 to 31 Mar 2024 inclusive.
In addition, duplicate and empty rows are also removed.
q1_pte_raw <- realis_all %>%
filter(`Sale Date` >= as.Date("2024-01-01") &
`Sale Date` <= as.Date("2024-03-31")) %>%
distinct() %>%
drop_na()View Data
glimpse(q1_pte_raw)Rows: 4,902
Columns: 21
$ `Project Name` <chr> "THE LANDMARK", "POLLEN COLLECTION", "SK…
$ `Transacted Price ($)` <dbl> 2726888, 3850000, 2346000, 2190000, 1954…
$ `Area (SQFT)` <dbl> 1076.40, 1808.35, 1087.16, 807.30, 796.5…
$ `Unit Price ($ PSF)` <dbl> 2533, 2129, 2158, 2713, 2453, 2577, 838,…
$ `Sale Date` <date> 2024-01-01, 2024-01-01, 2024-01-01, 202…
$ Address <chr> "173 CHIN SWEE ROAD #22-11", "34 POLLEN …
$ `Type of Sale` <chr> "New Sale", "New Sale", "New Sale", "New…
$ `Type of Area` <chr> "Strata", "Land", "Strata", "Strata", "S…
$ `Area (SQM)` <dbl> 100.0, 168.0, 101.0, 75.0, 74.0, 123.0, …
$ `Unit Price ($ PSM)` <dbl> 27269, 22917, 23228, 29200, 26405, 27741…
$ `Nett Price($)` <chr> "-", "-", "-", "-", "-", "-", "-", "-", …
$ `Property Type` <chr> "Condominium", "Terrace House", "Apartme…
$ `Number of Units` <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
$ Tenure <chr> "99 yrs from 28/08/2020", "99 yrs from 0…
$ `Completion Date` <chr> "Uncompleted", "Uncompleted", "Uncomplet…
$ `Purchaser Address Indicator` <chr> "Private", "N.A", "HDB", "N.A", "Private…
$ `Postal Code` <chr> "169878", "807233", "469657", "118992", …
$ `Postal District` <chr> "03", "28", "16", "05", "21", "21", "28"…
$ `Postal Sector` <chr> "16", "80", "46", "11", "59", "58", "79"…
$ `Planning Region` <chr> "Central Region", "North East Region", "…
$ `Planning Area` <chr> "Outram", "Serangoon", "Bedok", "Queenst…
q1_pte_raw contains:
Private residential property transaction data from 1st January 2024 to 31st March 2024
There are 4902 rows and 21 columns.
Keep Relevant Columns
Not all 21 columns will be used for analysis e.g. irrelevant, contains overlapping information as another column. Only relevant columns will be kept.
Columns to drop:
Type of Area: Not used in analysis
Area (SQM): Similar information as Area (SQFT)
Unit Price ($ PSM): Similar information as Unit Price ($ PSF)
Nett Price ($): Similar information as Transacted Price ($)
Purchaser Address Indicator: Not used in analysis
Postal District and Postal Sector: Overlapping information as Postal Code
Columns to be dropped can be specified by prefixing the column names with a minus sign (-) when using the select() function from the dplyr package.
q1_pte <- q1_pte_raw %>%
select(
-`Type of Area`,
-`Area (SQM)`,
-`Unit Price ($ PSM)`,
-`Nett Price($)`,
-`Purchaser Address Indicator`,
-`Postal District`,
-`Postal Sector`
)View Data
glimpse(q1_pte)Rows: 4,902
Columns: 14
$ `Project Name` <chr> "THE LANDMARK", "POLLEN COLLECTION", "SKY EDEN@…
$ `Transacted Price ($)` <dbl> 2726888, 3850000, 2346000, 2190000, 1954000, 34…
$ `Area (SQFT)` <dbl> 1076.40, 1808.35, 1087.16, 807.30, 796.54, 1323…
$ `Unit Price ($ PSF)` <dbl> 2533, 2129, 2158, 2713, 2453, 2577, 838, 1513, …
$ `Sale Date` <date> 2024-01-01, 2024-01-01, 2024-01-01, 2024-01-01…
$ Address <chr> "173 CHIN SWEE ROAD #22-11", "34 POLLEN PLACE",…
$ `Type of Sale` <chr> "New Sale", "New Sale", "New Sale", "New Sale",…
$ `Property Type` <chr> "Condominium", "Terrace House", "Apartment", "A…
$ `Number of Units` <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ Tenure <chr> "99 yrs from 28/08/2020", "99 yrs from 09/12/20…
$ `Completion Date` <chr> "Uncompleted", "Uncompleted", "Uncompleted", "U…
$ `Postal Code` <chr> "169878", "807233", "469657", "118992", "598444…
$ `Planning Region` <chr> "Central Region", "North East Region", "East Re…
$ `Planning Area` <chr> "Outram", "Serangoon", "Bedok", "Queenstown", "…
q1_pte contains:
Private residential property transaction data from 1st January 2024 to 31st March 2024
There are 4902 rows and 14 columns.
Visualisation
There are various types of Properties for Private residences.
unique(q1_pte$`Property Type`)[1] "Condominium" "Terrace House" "Apartment"
[4] "Executive Condominium" "Semi-Detached House" "Detached House"
The different types in the dataset are:
Condominium
Terrace House
Apartment
Executive Condominium
Semi-Detached House
Detached House
Price Distribution
A box plot is used to visualize the distribution of transacted prices for each private residence property type using geom_boxplot(). This type of plot shows five summary statistics: the median, two hinges (interquartile range), and two whiskers (extreme data points within 1.5 times the IQR). Additionally, all individual “outlier” points outside the whiskers are displayed, providing insight into the spread and variability of prices across different property types.
For better visibility, labels parameter with the function scales::number, formats the y-axis labels to include thousands separators.

box_plot <- ggplot(data = q1_pte,
aes(x = `Property Type`, y = `Transacted Price ($)`)) +
geom_boxplot() +
labs(title = "Box Plot of Transacted Price ($) by Property Type",
x = "Property Type",
y = "Transacted Price ($)") +
scale_y_continuous(labels = scales::number) +
theme_gray() + theme(axis.text.x = element_text(angle = 45,
hjust = 1,
vjust=1))
box_plotThere are six private property types in the dataset.
The boxplot shows Detached Houses generally have the highest transacted prices. The Q1, median, and Q3 values for this property type are all substantially higher than those of other types, indicating its premium market status. Detached Houses also exhibit the highest variability in transacted prices, although there are relatively few outliers.
In contrast, both Executive Condominiums and Terrace Houses demonstrate the least variability in transacted prices, with their interquartile ranges (IQR) closely aligning with their medians. This suggests a stable and consistent pricing trend for these property types.
Executive Condominiums have the lowest transacted prices, indicating their appeal as an affordable option within the private housing market.
Apartments and Condominiums, on the other hand, show a significant number of outliers in their transacted prices, suggesting a wider range of pricing and potentially more diversity in market conditions for these property types.
Choropleth Map
A choropleth map is useful for visualizing the variation in average property prices across different areas in Singapore. By shading each region according to its average property price, areas with higher or lower prices can be identified, thus revealing patterns and trends in property values across the city-state. This type of map can reveal hotspots of activity and areas with more affordable or expensive property options.
Two data sets will be used to create the map. They are:
MPSZ-2019: This data provides the sub-zone boundary of URA Master Plan 2019. It can be downloaded at data.gov.sg It consists of the geographical boundary of Singapore at the planning subzone level. The data is based on URA Master Plan 2019.
Private residential property transaction data from 1st January 2024 to 31st March 2024 in tibble data frame (i.e.
q1_pte).
The code chunk below loads the following packages:
tmap: for thematic mapping
sf: for geospatial data handling
httr: Make HTTP requests and handles web APIs
future: Allows sequential and parallel processing
furrr: combine purrr’s family of mapping functions (within tidyverse) with future’s parallel processing capabilities
pacman::p_load(tmap,sf,httr, future, furrr)Geospatial Data
![]() |
![]() |
Import Geospatial Data
The code chunk below uses the st_read() function of sf package to import MPSZ-2019 shapefile into R as a simple feature data frame called mpsz.
mpsz <- st_read(dsn = "data/geospatial",
layer = "MPSZ-2019") %>%
st_transform(crs = 3414)Reading layer `MPSZ-2019' from data source
`C:\lnealicia\ISSS608\Take-home_Ex\Take-home_Ex01\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 332 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 103.6057 ymin: 1.158699 xmax: 104.0885 ymax: 1.470775
Geodetic CRS: WGS 84
Add coordinates
Link postal code to their relevant coordinates. Empty rows are removed. Not all postal codes may have relevant sales during Q1 of 2024.
q1_pte_coor <- q1_pte %>%
left_join(
found %>% select(results.POSTAL, results.LATITUDE, results.LONGITUDE),
by = c("Postal Code" = "results.POSTAL")
) %>%
rename(
Latitude = results.LATITUDE,
Longitude = results.LONGITUDE
) %>%
filter(!is.na(Longitude) & !is.na(Latitude))Convert to a sf tibble data frame
q1_pte_sf <- st_as_sf(q1_pte_coor,
coords = c("Longitude", "Latitude"),
crs =4326) %>%
st_transform(crs = 3414)Keep relevant columns
q1_pte_sf_plot <- q1_pte_sf %>%
select(-`Project Name`,
-`Sale Date`,
-`Address`,
-`Type of Sale`,
-`Tenure`,
-`Completion Date`)Average Transacted Price by Planning Area
avg_txn_px <- q1_pte_sf_plot %>%
group_by(`Planning Area`) %>%
summarize(
Avg_Transacted_Price = mean(`Transacted Price ($)`, na.rm = TRUE)
)
avg_txn_px <- avg_txn_px %>%
mutate(`Planning Area` = toupper(`Planning Area`))
avg_txn_px <- st_drop_geometry(avg_txn_px)Combine avg_txn_px and mpsz
Populates the average transacted price of each planning area into mpsz sf data frame. Not every planning area may have transaction data, so empty rows are dropped.
mpsz_avg_txn_px <- mpsz %>%
left_join(
avg_txn_px,
by = c("PLN_AREA_N" = "Planning Area")
) %>%
drop_na()Plotting Choropleth Map and Geographical Distribution of Private Properties
The overall plot shows both the average transacted price of the planning area and the distribution of private residences. If viewing the data separately is preferred, click on the relevant tabs.
tmap_mode("view")
map <- tm_shape(mpsz_avg_txn_px) +
tm_polygons(col = "Avg_Transacted_Price",
palette = "YlOrRd",
alpha = 0.3,
style = "quantile",
n = 7) +
tmap_options(check.and.fix = TRUE) +
tm_shape(q1_pte_sf_plot) +
tm_dots(col = "Property Type") +
tm_view(set.zoom.limits = c(11,14))
maptmap_mode("plot")tmap_mode("view")
map2 <- tm_shape(mpsz_avg_txn_px) +
tm_polygons(col = "Avg_Transacted_Price",
palette = "YlOrRd",
alpha = 0.3,
style = "quantile",
n = 7) +
tmap_options(check.and.fix = TRUE) +
tm_view(set.zoom.limits = c(11,14))
map2tmap_mode("plot")tmap_mode("view")
map3 <- tm_shape(mpsz_avg_txn_px) +
tm_polygons() +
tm_shape(q1_pte_sf_plot) +
tm_dots(col = "Property Type") +
tm_view(set.zoom.limits = c(11,14))
map3tmap_mode("plot")tmap_mode(“view”):
Designed for interactive viewing of spatial data.
Enables zooming, panning, and other interactive features, making it easier to explore the data in detail.
tmap_mode(“plot”):
Used for static plotting of spatial data that can be saved as static images e.g., PNG
Does not support interactions like zooming and panning.
Setting the mode to plot after the map is generated saves on resources and allows the code to run faster, as the plot does not require continuous updating.
The plots reveal that of the private properties sold in Q1 2024, areas with the highest average transaction price are concentrated around the Central Region, including areas like Bukit Timah and Newton. This is likely due to their proximity to the Central Business District (CBD). Although there are very few transactions in Sentosa and Changi, these areas still have high average transaction prices, mainly consisting of condominiums.
Sales of Executive Condominiums, which were previously noted to have the lowest overall transaction price and low transaction price variability, are predominantly located in the outskirts of Singapore, such as Woodlands and Bukit Batok. This suggests a trend toward more affordable options in these areas.
Apartments and condominiums constitute the majority of private residences sold and are distributed throughout the city-state. This extensive distribution may account for the significant number of outliers observed earlier in transaction prices.

